SLIQ: A Fast Scalable Classifier for Data Mining
Abstract
Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most classification algorithms are designed only for memory-resident data, which limits their suitability for mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree-growing strategy to enable classification of disk-resident data sets. SLIQ also uses a new tree-pruning algorithm that is inexpensive and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale to large data sets and to classify data sets irrespective of the number of classes, attributes, and examples (records), making it an attractive tool for data mining.
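The pre-sorting and breadth-first growth described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: only numeric attributes with binary splits of the form `value <= threshold`, the gini index as the split measure, and a layout loosely modeled on SLIQ's idea of one attribute list per attribute (sorted once, before growth) plus a class list mapping each record id to its label and current leaf. All function names and the toy records are hypothetical. Pruning, categorical attributes, and disk-resident storage are not modeled.

```python
from collections import defaultdict


def gini(counts):
    """Gini index 1 - sum_j p_j^2 of a {label: count} histogram."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def presort(records, attr):
    """Attribute list for one numeric attribute: (value, record id), sorted once."""
    return sorted((rec[attr], rid) for rid, rec in enumerate(records))


def leaf_histograms(class_list):
    """Class counts per leaf, read off the class list of (label, leaf id) pairs."""
    totals = defaultdict(lambda: defaultdict(int))
    for label, leaf in class_list:
        totals[leaf][label] += 1
    return totals


def best_splits_one_pass(attr_list, class_list):
    """One sequential scan of a presorted attribute list finds the best
    'value <= threshold' split for every impure leaf at once."""
    totals = leaf_histograms(class_list)
    below = defaultdict(lambda: defaultdict(int))  # leaf -> counts of records already scanned
    last_value = {}  # leaf -> last attribute value seen in that leaf
    best = {}  # leaf -> (weighted gini, threshold)
    for value, rid in attr_list:
        label, leaf = class_list[rid]
        if len(totals[leaf]) < 2:
            continue  # pure leaf: nothing to split
        if leaf in last_value and last_value[leaf] != value:
            left = below[leaf]
            right = {c: n - left.get(c, 0) for c, n in totals[leaf].items()}
            n_left, n_right = sum(left.values()), sum(right.values())
            score = (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)
            if leaf not in best or score < best[leaf][0]:
                best[leaf] = (score, last_value[leaf])
        below[leaf][label] += 1
        last_value[leaf] = value
    return best


def apply_split(attr_list, class_list, leaf, threshold, left_id, right_id):
    """Repoint the class-list entries of `leaf` to its two new children."""
    for value, rid in attr_list:
        label, current = class_list[rid]
        if current == leaf:
            class_list[rid] = (label, left_id if value <= threshold else right_id)


def grow_one_level(attr_lists, class_list, next_id):
    """Breadth-first step: evaluate splits with one scan per attribute list,
    then split every leaf at this depth on its best (attribute, threshold)."""
    best = {}  # leaf -> (score, attribute, threshold)
    for attr, attr_list in attr_lists.items():
        for leaf, (score, threshold) in best_splits_one_pass(attr_list, class_list).items():
            if leaf not in best or score < best[leaf][0]:
                best[leaf] = (score, attr, threshold)
    splits = {}
    for leaf in sorted(best):
        _, attr, threshold = best[leaf]
        apply_split(attr_lists[attr], class_list, leaf, threshold, next_id, next_id + 1)
        splits[leaf] = (attr, threshold, next_id, next_id + 1)
        next_id += 2
    return splits, next_id


# Hypothetical toy data; record ids are list positions.
records = [
    {"age": 40, "salary": 15, "label": "G"},
    {"age": 60, "salary": 25, "label": "G"},
    {"age": 35, "salary": 50, "label": "B"},
    {"age": 45, "salary": 60, "label": "B"},
    {"age": 22, "salary": 70, "label": "G"},
    {"age": 50, "salary": 80, "label": "B"},
]
attr_lists = {a: presort(records, a) for a in ("salary", "age")}  # sorted once
class_list = [(rec["label"], 0) for rec in records]  # every record starts at root leaf 0
next_id = 1
levels = []
while True:
    splits, next_id = grow_one_level(attr_lists, class_list, next_id)
    if not splits:
        break
    levels.append(splits)
print(levels)  # [{0: ('salary', 25, 1, 2)}, {2: ('age', 22, 3, 4)}]
```

In this toy run the root splits on `salary <= 25`; at depth two only leaf 2 is impure and splits on `age <= 22`. Because the class list records each record's current leaf, a single pass over a sorted attribute list evaluates candidate splits for every leaf at a given depth, so the attribute lists never need re-sorting. This sketch keeps everything in memory and does not model streaming the attribute lists from disk.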
Similar resources
CC-SLIQ: Performance Enhancement with 2k Split Points in SLIQ Decision Tree Algorithm
Decision trees have been found to be very effective for classification in the emerging field of data mining. This paper proposes a new method: CC-SLIQ (Cascading Clustering and Supervised Learning In Quest) to improve the performance of the SLIQ decision tree algorithm. The drawback of the SLIQ algorithm is that in order to decide which attribute is to be split at each node, a large number of G...
SLEAS: Supervised Learning using Entropy as Attribute Selection Measure
Scaling up the widely used decision tree learning algorithms to huge data sets is of emerging importance. Although many diverse methodologies have been proposed, a fast tree-growing algorithm without a substantial decrease in accuracy or a substantial increase in space complexity is still needed. This paper aims at improving the performance of the SLIQ (Supervised Le...
An Approach to Automation Selection of Decision Tree based on Training Data Set
In data mining applications, very large training data sets with several million records are common. Decision trees are a powerful technique for both classification and prediction problems. Many decision tree construction algorithms have been proposed to handle large or small training data. Some related algorithms are best for large data sets and some for small ...